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Abstract 



We provide a simple analysis of the Douglas Rachford splitting algorithm in the context 
of i 1 minimization with linear constraints, which explains why the asymptotic convergence 



> 

rate is linear. The convergence rate is given in terms of principal angles between relevant 



vector spaces, and does not depend on the shrinkage parameter. In the compressed sensing 

m 

setting, we show how to bound this rate in terms of the restricted isometry constant. More 
general iterative schemes obtained by £ 2 -regularization and over-relaxation are also treated. 
We make no attempt at characterizing the transient regime preceding the onset of linear 
convergence. 
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1 Introduction 
1.1 Setup 

In this paper we consider certain splitting algorithms for basis pursuit [5], the constrained 
optimization problem 

min ||x||i s.t. Ax = b (1-1) 

Throughout this paper we consider A 6 jj mxn with m < n, and we assume that A has full 
row rank. We also assume that the solution x* of (11. ip is unique. 

In particular, we treat splitting algorithms that naturally arise in the scope of minimiza- 
tion problems of the form 

min f(x) + g(x), 

X 

where / and g are convex, lower semi- continuous (but not otherwise smooth), and have 
simple resolvents 

J jF = (l + 1 F)-\ J, G = (l + lG)-\ 

where F = df(x) and G = dg(x) are the respective subdifferentials of / and g at x. In 
those terms, x is a minimizer if and only if G F(x) +G(x). Resolvents are also often called 
proximal operators, as they obey J y p(x) = argmin 2 f(z) + ^\\z — x\\ 2 . In the case of basis 
pursuit, it is well known that 

• f(x) = \\x\\i and g(x) = i{ X :Ax=b}, the indicator function equal to zero when Ax = b 
and +oo otherwise; 

• J 1 f is soft-thresholding (shrinkage) by an amount 7, 

J lF {x)i = S 1 (x) i = sgn(xi) max{|xj| - 7, 0}; 

• J 7 g is projection onto the set Ax = b, namely 

Jjg(x) = P(x) = x + A + (b - Ax), 

with A + = A T (AA T )~ 1 denoting the pseudo inverse. 
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The simplest splitting algorithm based on the resolvents is 

This iteration is sucessful in the special case when / and g are both indicators of convex 
sets, but does not otherwise generally enjoy good convergence properties. Instead, one is led 
to consider reflection operators R^p = 2J^ F — I, R lG = 2J lG — I, and write the Douglas- 
Rachford splitting [12], [7] 

f7/ fc+1 = If /?_.„/?_.„ 4- n?/ fc . 

(1.2) 



y k+1 = \(R lF R lG + I)y\ 

x k+l _ J^ G y*+l ) 



where / is the identity. The operator T 7 = \{R iF R 1 q + I) is firmly non-expansive regardless 
of 7 > [12] . Thus y k converges to one of its fixed points y*. Moreover, x* = J 7 g(?/*) 
is one solution to 6 F(x) + G(x). For general problems, the sublinear convergence rate 
0(l/k) was proven for averages of iterates pE]. See also [9] where a result of convergence of 
the difference between iterates is proved. Convergence questions for the Douglas- Rachf or d 
splitting were recently studied in the context of projections onto possibly nonconvex sets 

pun]. 

In the case of basis pursuit, we note that the Douglas- Rachford (DR) iteration takes the 
form 

(1.3) 



yk+l _ g^x k - y k ) + y k - x k , 

x k + l = y k + l + _ Ayk+lj 



For convenience, we use R = 2P — I to denote reflection about Ax = b, i.e., R(x) = x + 
2A + {b — Ax). It is easy to see that R is idempotent. Then T 7 = >5 7 oR + I- P. 

1.2 Main result 

In practice one often observes fast convergence for (jl.3p . For instance, see Figure [TTT1 for an 
illustration of a typical error curve where the matrix A is a 3 x 40 random matrix and x* has 
three nonzero components. Notice that the error ||7/ fc — y*\\ is monotonically decreasing since 
the operator T 7 is non-expansive. The same cannot be said of \\x h — x*\\. The iterations 
quickly settled into linear convergence of the y k 



In this example, the regime of linear convergence was reached quickly for the y k . That 
may not in general be the case, particularly if AA T is ill-conditioned. Below, we provide the 
characterization of the error decay rate in the linear regime. To express the result, we need 
the following notations. 

Assume that the unique solution x* of (II. ip has r zero components. Let (i — 1, • • • , n) 
be the standard basis in R n . Denote the basis vectors corresponding to zero components 
in x* as ej (j = i\, • • • ,i r ). Let B be the r x n selector of the zero components of x*, i.e., 
B = [e^, • • • , e« r ] T . Let Af(A) = {x : Ax = 0} denote the nullspace of A and 1Z(A T ) = {x : 
x = A T z, z G R m } denote the range of A T . 

Then, for the numerical example discussed earlier, the slope of log \\y k — as a function 
of k is cos#i for large k, where 81 is the first principal angle between M{A) and M{B). See 
Definition 12.31 in Section 12.31 for principal angles between subspaces. 
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Figure 1.1: A typical error curve for Douglas Rachford 

Our main result is that the rate of decay of the error is indeed cos 6\ for a large class of 
situations that we call standard, in the sense of the following definition. 

Definition 1.1. Consider a basis pursuit problem (b,A) with solution x* . Consider y° an 
initial value for the Douglas- Rachford iteration, and y* = lim^oo T™y°. 
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Consider the preimage of the soft thresholding of all vectors with the same signs as x* : 
Q = {S-\x) : sgn(a;) = sgn(x*)} = Q x ® Q 2 ® ■ ■ ■ Q n , 

where 

f(7,+oo), ifx*>0 
Qj = I (-00,-7), ifx) < . 
I [— 7, 7], otherwise 

We call (b,A;y°) a standard problem for the Douglas-Rachford iteration ifK(y*) belongs to 
the interior of Q, where R is the reflection operator defined earlier. In that case, we also 
say that the fixed point y* of T 1 is an interior fixed point. Otherwise, we say that (b,A;y ) 
is nonstandard for the Douglas-Rachford iteration, and that y* is a boundary fixed point. 

Theorem 1.2. Consider (b,A;y°) a standard problem for the Douglas-Rachford iteration, 
in the sense of the previous definition. Then the Douglas-Rachford iterates y k obey 

||y fc -yl<C(cos6>i)\ 

where C may depend on b, A and y° (but not on k ), and where Q\ is the leading principal 
angle between Af (A) andAf(B). 

The auxiliary variable y k in ( II. 3p converges linearly for sufficiently large k, thus x k is also 
bounded by a linearly convergent sequence since ||x fc — x*\\ = ||P(y ) — P(y*)|| = ||P(y fc — 

y*)\\ < ||y fe -y*||- 

Intuitively, convergence enters the linear regime when the support of x k essentially 
matches that of x*. By essentially, we mean that there is some technical consideration 
(embodied in our definition of a "standard problem") that this match of supports is not a 
fluke and will continue to hold for all iterates from k and on. When this linear regime is 
reached, our analysis in the standard case hinges on the simple fact that T y (y k ) — y* is a 
linear transformation on y k — y* with an eigenvalue of maximal modulus equal to cos#i. 

In the nonstandard case (y* a boundary fixed point), we furthermore show that the rate of 
convergence for y k is generically of the form cos6?i, where < 6\ < Q\ is the leading principal 
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angle between Af(A) and M{B), with B a submatrix of B depending on y*. Nongeneric cases 
are not a priori excluded by our analysis, but have not been observed in our numerical tests. 
See Section 12.51 for a discussion of the different types of nonstandard cases. 

Details and proof of the main result will be discussed in Section 2. In Sections 3 
and 4, we apply the same methodology to obtain convergence rates for two variants of 
the Douglas- Rachford algorithm, namely an over-relaxed version introduced by in [7], and 
Douglas- Rachford splitting on the ^-regularized basis pursuit. 

2 Douglas Rachford for Basis Pursuit 
2.1 Preliminaries 

For any subspace X in IR™, we use V x{z) to denote the orthogonal projection onto X of the 
point z <ER n . 

As previously, we denote F(x) = <9||x||i, G(x) = di{ x: A x =b}, and the resolvents are 
J 7 i?(a;) = S 1 (x) and J 7 g(^c) —P(x) = x + A + {b — Ax). 

Let N(x*) denote the set of coordinate indices associated with the nonzero components 

of x*, namely, N(x*) U • • • , i r } — {1, • • • , n}. Recall the definition of Q in the previous 

section. Then for any z G Q, the soft thresholding operator can be written as S 1 (z) = 

2 — 7 sgn(x*)ej — B + Bz. 

jeN(x-) 

Lemma 2.1. The assumption that x* is the unique minimizer of ( fl.ll) implies N(A) fl 
N{B) = {0}. 

Proof. For the subspace Af(B) C R™, we use x/^-B) to denote the restriction of x G R™ to 
the nonzero coordinates of x*, and 4a/"(b), a m x {n — r) matrix, to denote the restriction of 
A to the columns corresponding to nonzero coordinates of x* . 

Suppose Af(A) fl Af(B) is nontrivial, then the set {z G N(B) : A^^z = b} contains 
infinitely many points. The point z* = is the unique minimizer of 

min{||z||i : z G Af(B), A M(B) z = b}. (2.1) 
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Let d||2*||i denote the subgradient of i 1 ball \\z\\i = \\z*\\i in the subspace M{B) at 
the point z = z*. Since Af{B) is the support of x*, all the components of z* are nonzero, 
thus = {sgn(z*)} and sgn(z*) is a normal vector to a (n — r — l)-dimensional face 

of the I 1 ball ||z||i = By the first order optimality condition, we have {sgn(z*)} = 

d\\z*\\^n{Al {S) ). 

Therefore, the set {z G N(B) : Aj^^z = b} shares a common normal vector with the 
(n — r — l)-dimensional face of the (n — r)-dimensional I 1 ball. This contradicts with the 
uniqueness of the minimizer of ( 12. II) . □ 

The sum of the dimensions of M{A) and N(B) should be no larger than n since M{A) n 
N(B) = {0}. Thus, n — m + n — r < n implies m > n — r. 

J\f{A) n M{B) = {0} also implies the orthogonal complement of the subspace spanned 
by N(A) and Af(B) is TZ{A T ) n TZ{B T ). Therefore, the dimension of 7Z(A T ) n 7Z(B T ) is 
m + r — n. 

2.2 Characterization of the fixed points of T 7 

Since dt^ x ._A x =b} — H(A T ), the first order optimality condition for ( II. ip reads G 9||a;*||i + 
K(A T ), thus d\\x*\\if]TZ(A T ) ^ 0. Any such 77 G n^(A T ) is called a dual certificate. 

We have the following characterization of the fixed points of T 7 . 

Lemma 2.2. The set of the fixed points of T 7 can be described as 

{y* ; y* = x* — 777, T) G n^(A T )}. 

Moreover, for any two fixed points y* andy\, we haveyl—y%,R(y*)—R(y2) G 1Z(A T )C\1Z(B T ) . 
Thus there is a unique fixed point y* if and only ifTZ(A T ) fl 1Z(B T ) = {0}. 

Proof. For any 77 G <9||a;*||i fl TZ(A T ), consider the vector y* = x* — 777. Since Ax* = b and 
A + Ar] = r\ (implied by 77 G TZ{A T )), we have P(y*) = y* + A + (b- Ay*) = x* - 7// + A + (b - 
Ax* + A777) = x* + A + {b — Ax*) = x*. Further, 7/ G <9||x*||i implies S" 7 (x* + 777) = x*. Thus 
T 7 (?/) = S 7 (2x* - y*) + y* - x* = 5 7 (x* + 77/) - x* + y* = y*. 
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Second, for any fixed point y* of the operator T 7 , let 77 = (x* — y*)/j- Then 

P(y*) = x*, (see Theorem 5 in [7]) (2.2) 

implies r\ = A + Arj, thus 77 G 1Z(A T ). Further, y* = T 7 (y*) implies S 7 (x* + 777) = x*. We 
have x* = argmin 2 7||z||i + h\\z — (x* + 7^)|| 2 , thus 77 G <9||a;*||i. 

Finally, let y\ and y\ be two fixed points. Then y\ — y\ = —7(771 — i] 2 ) and R(t/*) — 
■^(2/2) = t(^i ~ ^2) for some 771,772 £ ^||^*||i n7?.( J 4 T ). Notice that 771,772 £ ^H^lli implies 
Vi ~ V2 E K{B T ). So we get y\ - y*, R(y*) - R(y*) G 1l(A T ) H ft(5 T ). □ 

With the assumption the matrix A has full row rank, the following condition is sufficient 
[8] and necessary [15] to ensure existence of a unique solution x* to (11. ip : 

1. those columns of A with respect to the support of x* are linearly independent. 

2. there exists a dual certificate 77 G <9||x*||i D 1Z(A T ) such that Pjvxb)^) — ^Af(B)(x*) 
and || P^(bt) (t/)!!^ < 1. 

Therefore, with assumption that there is a unique solution x* to (11.11) . there always exists 
a dual certificate 77 G <9||a;*||i (~)7l(A T ) such that ~P/s(b){v) = P^( B )(x*) and HP^^T)^) < 
1. By Lemma 12.21 y* = x* — 777 is a fixed point. And R(y*) is in the interior of Q since 
R(y*) = R(x* — 777) = x* + 777. 

We call a fixed point y* an interior fixed point if R(y*) in the interior of the set Q, or a 
boundary fixed point otherwise. A boundary fixed point exists only if lZ(A T )r\7l(B T ) 7^ {0}. 

Definition 2.3. Let U and V be two subspaces of W 1 with dim{U) = p < dim(V). The 
principal angles 9k G [0, |] (k = 1, • • • ,p) between U and V are recursively defined by 

cos8b = maxmaxM T f = uTvu, \\u\\ = \\v\\ = 1, 

uju = 0, ,uju = 0, , j = 1, 2, • ■ ■ , k — 1. 
The vectors (u±, • • ■ , u p ) and (vx, ■ ■ • , v p ) are called principal vectors. 
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Lemma 2.4. Assume y* is a boundary fixed point and y* lies on a L- dimensional face of 
the set Q. Namely, there are L coordinates ji, ■ ■ ■ ,jl such that \R(y*)j t \ =7 (I = 1, - ■ ■ ,L). 
Recall that B = [e^, • • ■ , e ir ] T , hence {ji, ■ ■ ■ is a subset of {ii, ■ ■ ■ ,i r }. Let B 1 denote 
the (r — 1) x n matrix consisting of all row vectors of B except [e^] 7 . Recursively define Bi 
as the (r — /) x n matrix consisting of all row vectors of -Ety-i) except [ej ; ] T for I = 2, • ■ ■ , L. 
Then L < dim [7£(A T ) R 7Z(B T )~^ , and the first principal angle between M{A) and N{BA 
(/ = 1, • • • , L) is nonzero. 

Proof. Let TZi (/ = 1, • • • , L) denote the one dimensional subspaces spanned by ej v then 
^(Btf-i)) = Hi®H(Bi) and Af(Bi) = 7L X ®N(B {l _ x) ). 

Let z* be an interior fixed point. Notice that ^(y*)^! = 7 and |R(,z*)j ; | < 7 for each 
I = ,L, thus P Ul [R(y*) - R{z*)\ = R{y*) jt - R{z*) jt ^ 0. By Lemma O we have 

R(y*) - R(z*) G n(A T ) n Tl(B T ), therefore 

Hi £ (n(A T )nn(B T )) ± , VZ = 1, (2.3) 

Since TZ(B T ) = TZ(Bf) © K x © • • ■ %_ x) , with (EH), we conclude that 

dim [TZ(A T ) n TL(Bf)] < dim [ll(A T ) n H(B T )] - I, V/ = 1, ■ ■ • , L, (2.4) 

thus I < dim [JZ(A T ) n K(B T )] . In particular, we have L < dim [JZ(A T ) n ft(5 T )] = 
m + r — n. 

Let M{A) U A/"(.B) denote the subspace spanned by N(A) and N(B). Since R n = 
[ft(A T ) n TZ(B T )] © U Af(B)] = \K(A T ) n ft(fl, r )] © u byflHD, we 

have dim[AT(A) UJV(Sj)] > d\m[M {A) U M {B)\ + / = dim \N{A)} + dim + / = 

dim [jV(A)] + dim[jV(Sj)]. Therefore N(A) n Af(B t ) = 0, and the first principal angle 
between M{A) and M{B{) is nonzero. □ 

2.3 The characterization of the operator T 7 

Lemma 2.5. For any y satisfying R(y) G Q and any fixed point y* , T 7 (y) — T 7 (y*) = 
[(/„ — B + B)(I n — A + A) + 5 + i?A + A](y — y*) where I n denotes the n x n identity matrix. 
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Proof. First, we have 

T 7 (y) = [S 1 o (2P — I) + 1 — P](y) = S 7 (R(y)) + y — P{y) 

= R(y)- 7 e jS gn(x*)-B + BR(y)+y-P(y) 

jeN(x*) 

= P(y)- 7 ei sgn(^)-B + SR(y). 

jGiV(x*) 

The last step is due to the fact R = 2P — I. The definition of fixed points and ( 12. 2ft imply 

S 7 (R(y*)) = a;*, (2.5) 

thus R(y*) G Q. So we also have 

r 7 (y*) = P(y*)-7 ]T e i sgn(^)-S+SR( J /*). 

Let v = y — y*, then 

T 7 (y)-T 7 (y*) = P{y)-B+BR(y)-[P(y*)-B+BR{y*)] 
= y + A + (b-Ay)-B + B(y + 2A + (b-Ay)) 

- [y* + A+{b- Ay*) - B + B(y* + 2A+(b - Ay*))] 
= v - A + Av - B+Bv + 2B + BA + Av 
= \{I n -B + B){I n -A + A) + B + BA + A}v. 

□ 

We now study the matrix T = (J n — B+B)(I n - A + A) + B+BA+A. 

Let A be a n x (n — m) matrix whose column vectors form an orthonormal basis of M{A) 
and be a n x m matrix whose column vectors form an orthonormal basis of TZ(A T ). Since 
A + A represents the projection to TZ(A T ) and so is AiAj, we have A + A = AiAj. Similarly, 
I n — A + A = A Aq . Let B and B\ be similarly defined for MiB) and TZ(B T ). The matrix 
T can now be written as 

T = BqB^AqAI + B x B\A x Al. 
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It will be convenient to study the norm of the matrix T in terms of principal angles 
between subspaces. 

Without loss of generality, we assume n — r < n — m. Let 9i (i = 1, • • • , n — r) be 
the principal angles between the subspaces M(A) and M(B). Then the first principal angle 
9\ > since -Af(A) DAf(B) = 0. Let cos O denote the (n — r) x (n — r) diagonal matrix with 
the diagonal entries (cos^i, • • • , cos^^.j,)). 

The singular value decomposition (SVD) of the (n — r) x (n — m) matrix Eq = BJAq 
is E = UocosQV T with UqUq = V T V = I( n -r), and the column vectors of B Uq and A V 
give the princpal vectors, see Theorem 1 in [2]. 

By the definition of SVD, V is a (n — m) x (n — r) matrix and its column vectors are 
orthonormalized. Let V bea(n-m)x(r- m) matrix whose column vectors are normalized 
and orthogonal to those of V. For the matrix V = (V,V), we have I< n - m ) = VV T . For 
the matrix E\ = B±Aq, consider EfEi = A^ B\B\ Aq. Since BqBq + B\Bj = I n , we have 
E\E X = A?A - AlB B?A = 7 (n _ ro) - V cos 2 &V T = (V, V) ( ^ 9 / W V'f \ 
so the SVD of E\ can be written as 

B T X A = E X = U X ( Sin n 6 r ° )v T . (2.6) 



() I{r-m) 



Notice that A = B B%A + B^Aq = B E + B X E U so we have 



AqAq = (B , Bl )(^ )(B ,B 





f cos 2 e 


cos sin 





(B U ,B 1 Ui) | 


cos sin 


sin 2 







V o 





I(r—m) 



(Bo^B^f. 



Let C denote the orthogonal complement of 1Z(A ) fl 71(B ) in the subspace 71(B ), 
namely, 7Z(B T ) = \7l(A T ) n 7Z(B T )} © C. Then the dimension of C is n - m. Let B = B U 
and By = BiUi, then the column vectors of B form an ortho normal basis of Af(B). The 
column vectors of Bi are a family of orthonormal vectors in 7Z(B T ). Moreover, the SVD 
(12. 6p implies the columns of B\ and AqV are principal vectors corresponding to angles 
{§ — 0i, • " , f — 9(n-r), 0, • • • ,0} between the two subspaces 7Z(B T ) and M(A) , see [2]. And 
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9\ > implies the largest angle between TZ(B T ) and M(A) is less than 7r/2, so none of the 
column vectors of B\ is orthogonal to M{A) thus all the column vectors of B\ are in the 
subspace C. By counting the dimension of C, we know that column vectors of Bi form an 
orthonormal basis of C. 

Let B 2 be a n x (r + m — n) whose columns form an orthonormal basis of 1Z(A T ) C\7l(B T ), 
then we have 

\ 



(B , B\, B 2 ) 



\ 



cos 2 


cos sin 








cos 6 sin 


sin 2 














I(r—m) 














0(r+m— n) 



T 



\ B l ) 



Since (Bo, Bi, B 2 ) is a unitary matrix and A\Aj = I n — AqAq , we also have 



/ 



A\A\ = (B , B ly B 2 ) 



\ 



sin 2 


— cos sin 





1 


— cos sin 


cos 2 














0(r— m) 














I (r+m—n) 



\ 



(3S\ 

T 



(2.7) 



Therefore, we get the decomposition 

T = BoB^AoAT + B 1 BfA 1 Aj 
( 



(B , Bi, B 2 ) 



\ 



cos 2 


cos sin 








— cos sin 


cos 2 














0(r— m) 














I (r+m—n) 



\~ B 1 } 



(2. 



2.4 Standard cases: the interior fixed points 

Assume the sequence y k will converge to an interior fixed point. 

First, consider the simple case when 1Z(A T ) n TZ(B T ) = {0}, then m + r = n and the 
fixed point is unique and interior. Let B a (z) denote the ball centered at z with radius a. 
Let e be the largest number such that B e (R(y*)) C Q. Let K be the smallest integer such 
that y K G B £ (y*) (thus R^^) G B e (K(y*))). By nonexpansiveness of T 7 and R, we get 
R(y fc ) G B e (R(y*)) for any k > K. By a recursive application of Lemma 12.5} we have 



T,(y k ) 



T(T 7 (y 



^k-K f„.K 



(y 1 



Vfc > K. 
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Now, flUD and TZ{A T ) n Tl(B T ) = {0} imply ||T|| 2 = cos 01. Notice that T is normal, so we 
have || T 9 1| 2 = ||T||| for any positive integer q. Thus we get the convergence rate for large k: 

\\T^y k )-y%< (cos^) fe - A '||^-y*|| 2 , \/k > K. (2.9) 

If TZ(A T ) fl 1Z(B T ) 7^ {0}, then there are many fixed points by Lemma [2.21 Let X be 
the set of all interior fixed points. For z* G X, let e(z*) be the largest number such that 
B 6( ,.)(R(z*)) C Q. 

If y K G [J B e i z *\{z*) for some K, then consider the Euclidean projection of y K to X, 
denoted by y*. Then P^mW^ - V*) = since y* - y* G ft(A T ) n TZ{B T ) for any 
2/*, 2/2 £ 2T- By (12. 8p . 7?.(A T ) fl 1Z(B T ) is the eigenspace of eigenvalue 1 for the matrix T. So 
we have ||T(y R ' — y*)|| < cos^iHy^ — 2/* II) ^us the error estimate (12.91) still holds. 

The sequence y k may converge to a different fixed points for each initial value y°; the 

fixed point y* is the projection of y K to X. Here K is the smallest integer such that y K G 

U B e{ ^(R(z*)). 
z*ei 

Theorem 2.6. For the algorithm U.S\) solving U.l\) , if y k converges to an interior fixed 
point, then there exists an integer K such that A2.9\) holds. 

2.5 Nonstandard cases: the boundary fixed points 

Assume y k converge to a boundary fixed point y* . Then y k ^ \J B £ ( z *)(z*) for any k. 

z*£l 

For simplicity, we only discuss the case where there is only one coordinate j such that 
|R(y*)j| = 7. More general cases can be discussed similarly. Without loss of generality, 
assume j = 1 and R(y*)j = 7. Then the set Q is equal to Q\Q)Q2®- • • Qni with Qi = [—7, 7]. 
Consider another set Q\ = (7, +oo)©Q2©- • • Q n ■ Any open neighborhood of R(y*) intersects 
both Q and Q\. 

There are three cases: 

I. the sequence R(y k ) stays in Q if k is large enough, 

II. the sequence R(y k ) stays in Q\ if k is large enough, 
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III. for any K, there exists k\, k 2 > K such that R(y kl ) G Q and R(y k2 ) G Q±. 

Case I. Assume y k converges to y* and R(y k ) stay in Q for any k > K. Then 
^n(AT)nn(BT){y K -y*) must be zero. Otherwise, by flZS), we have lim = P^A^n^B^ty^- 

y*) 7^ 0. By ( 12 .8[) . the eigenspace of T associated with the eigenvalue 1 is 1Z(A T ) D 1Z(B T ), 
so (EU) still holds. 

Case II Assume y k converges to y* and R(y k ) stay in Qi for any k > K. Let i? = 
[e i2 , • • • , e ir ] T . Following Lemma [2~5| for any y satisfying R(y) G Qi, we have T 7 (y)— T 7 (?/*) = 
[(/ n - B+B)(I n - A+A) + - y*). 

Without loss of generality, assume n — r + 1 < n — m. Consider the (n — r + 1) principal 
angles between J\f(A) and Af(B) denoted by ■ ■ • ,#( n _ r+ i)). Let Gi denote the diagonal 
matrix with diagonal entries (9\, ■ ■ ■ , #( n _ r+ i)). Then the matrix T = (I n — B + B)(I n — A + A) + 
B + BA + A can be written as 

\ 



(-Bo ; -Bi, B 2 



\ 



cos 2 Gi 


cos Gi sin Gi 








— cos Gi sin Gi 


cos 2 Gi 














0(r— m— 1) 














I(r+m— n— 1) 



(Bl\ 



(2.10) 



where (B , B\, B 2 ) are redefined accordingly. 

By Lemma 9\ > 0. Following the first case, we have P-ji(A T )nii(B T )(y K ~ V*) — 0- So 

\\T,(y k ) - y% < (cos^) fe ^||^ - y%, Vk > K. 



Convergence is slower than previously, as 9\ < 9\. 

Case III Assume y k converges to y* and R.(y k ) stay in Q U Qi for any k > K. Then 
V n AT)rni(BT)(y K - y*) = 0. And for y k G Q x we have ||T 7 (?/ fc ) - y*\\ 2 < (cosily* - 
y*\\ 2 . Let £> be the orthogonal complement of Tl(A T ) n TZ(B T ) in ft(A T ) D Tl{B T ), namely 
ft(A T ) n U(B T ) = 1l(A T ) n ft(E T ) © V. For y fe G Q, we have ||P c ±(T 7 (?/ fc ) - y*)\\ 2 < 
cos9 1 \\F v ±(y k - y*)\\ 2 and P c (T 7 (y fc ) - y*) = P v (y k - y*). 

For the Case III, which we refer to as nongeneric cases, no convergence results like 
\\Tj(y k ) -y*\U < (cos9i)\\y k -y*\\2 can be established since Pv{T 1 (y k ) —y*) = Pv(y k — y*) 
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whenever R(y k ) G Q. Even though it seems hard to exclude Case III from the analysis, it 
has not been observed in our numerical tests. 

2.6 Relation to the Restricted Isometry Property 

Let A be a m x n random matrix and each column of A is normalized, i.e., X^^fj = 1 f° r 

i 

each j. The Restricted Isometry Property (RIP) introduced in [3] is as follows. 

Definition 2.7. For each integer s = 1,2, •■ ■ , the restricted isometry constants S s of A is 
the smallest number such that 

(l-5 s )||a:|| 2 <Px|| 2 <(l + ^)||a:|| 2 , (2.11) 

holds for all vectors x with at most s nonzero entries. 

In particular, any vector with the same support as x* can be denoted as (/„ — B + B)x 
for some x G R n . The RIP (12. lip with s = n — r implies 

(l-5 (n - r) )\\(I n -B + B)xf < || A(I n - B + B)x\\ 2 < (1 + 5 ( „- r) )||(/„ - B + B)x\\\ Vx e R n . 

Let d denote the smallest eigenvalue of (AA T )~ 1 . Then d > since we assume A has full 
row rank. For any vector y, we have 

\\A+Ayf = y T A T [{AA T )-'] T AA T {AA T )^Ay = y T A T [(AA T )~ 1 \ T Ay > d\\Ayf, 

where the last step is due to the Courant-Fischer-Weyl min-max principle. 
Therefore, we get 

\\A + A(I n - B + B)x\\ 2 > d\\A(I n - B + B)x\\ 2 > d(l - 5 (n _ r) )\\(I n - B + B)x\\ 2 , Va; G R™, 

(2.12) 

We will show that (12.121) gives a lower bound of the first principal angle 9\ between two 
subspaces N(A) and M{B). Notice that (12.71) implies 



sin 2 6 



A + A(I n - B + B) = A X A\ B Bq = (B U , B 1 U 1 ) \ - cos 9 sin 9 
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0_ 

I (BoU^B.U^ 




by which we have \\A+A(I n - B + B)x\\ 2 = x T (B U , BJJ^ 



Let z = (B U ,BiU 1 ) T x. Since I n - B+B = [B^U^B^ 



sin 2 












I (n—r) 












(B U ,B 1 U 1 ) T x. 
{B U Q} B l U l ) T , 



(I2.12p is equivalent to 



sin 2 6 












Z > d(l- Sfn-r^Z 1 



In—r 












z, Vz e BP 



which implies sin 2 6\ > d(l — <5( n _ r )) by the Courant-Fischer-Weyl min-max principle. So 
the RIP constant gives us 



COS 6»i < yjl - d(l - 5(n-r)). 

2.7 Numerical examples 

We consider several examples for ( 11.3H . In all the examples, y° = unless specified otherwise. 
Example 1 The matrix A is a 3 x 40 random matrix and x* has three nonzero components. 
By counting dimensions, we know that 1Z(A T ) n 7Z(B T ) = {0}. Therefore there is only one 
fixed point. See Figure 11.11 for the error curve of x k and y k with 7 = 1. Obviously, the 
error \\x k — is not monotonically decreasing but \\y k — y*\\ is since the operator T 7 is 
non-expansive. And the slope of ||y fc — y*|| is exactly cos^i = 0.9983 for large k. 
Example 2 The matrix A is a 10 x 1000 random matrix and x* has ten nonzero components. 
Thus there is only one fixed point. See Figure [2TT1 for the error curve of y k with 7 = 0.1, 1, 10. 
We take y* as the result of (11.31) after 8 x 10 4 iterations. The slopes of \\y k — y*\\ for different 
7 are exactly cos#i = 0.9995 for large k. 

Example 3 The matrix A is a 18 x 100 submatrix of a 100 x 100 Fourier matrix and x* has 
two nonzero components. There are interior and boundary fixed points. In this example, 
we fix 7 = 1 and test (jl.3p with random y° for five times. See Figure 12.21 for the error curve 
of x k . In Figure [2T2l in the second, third and fourth tests, y k converges to an interior fix 



point, thus the convergence rate for large k is governed 
fifth tests, y k converges to different boundary fixed point 
than cos#i. Nonetheless, the rate for large k is still linear. 



3V cos#i = 0.9163. In the first and 
lj thus convergence rates are slower 



1 At least, numerically so in double precision. 
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5000 10000 15000 



Iteration number k 



Figure 2.1: Example 2: for an interior fixed point, the asymptotic rate remains the same for 
different soft-thresholding parameter 7. The slope of the straight line is cosf^. 




3 Related algorithms 

3.1 Dual Variable Interpretation 

The algorithm (II. 2p is equivalent to a special case of Chambolle and Pock's primal-dual 
algorithm [3]. Let w k+1 = (x k — y fe+1 )/7, then (II. 2\\ is equivalent to 

w^ 1 = (1+ ±F*)~ 1 (w k + ±{2x k -x k ~ 1 )) 



Je+l 



7 / \ 7 - 



(3.1) 



./ :-= (l + ^G)- 1 (x k --fW k + 1 ) 

where F* is the conjugate function of F . Its resolvent can be evaluated by the Moreau 
identity, 

/ . X -1 / 

x 
1 



x = i\ + 1 F)-\x)+ 1 \l+-F* 
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Let x n = ^ x k and u> n = ^2 wk i then the duality gap of the point (x n ,w n ) converges 

k=l k=l 

with the rate 0(-). See [I] for the proof. Here w k will converge to a dual certificate 

7] e d\\x*\\ 1 nn(A T ). 



3.2 Generalized Douglas Rachford 

The following generalized Douglas Rachford splitting was proposed in [7] , 

yk+i = y k + \ k [S y (2x k - y k ) - x k ] 
x k+1 = y k+1 + A+(b - Ay k+l ) 



(3.2) 



where \ k E (0, 2). 

Let T 7 A = I + A [5 7 o (2P - 1) - P]. Then any fixed point of T 7 A satisfies P(y*) = x*, 
[6]. So the fixed points set of is the same as the fixed points set of T 7 . Moreover, for any 
y satisfying R(y) G Q and any fixed point y*, T$(y) - T$(y*) = [I n + X(I n - B+B){I n - 
2A+A)-\(I n -A+A)](y-y*). 

To find the asymptotic convergence rate of (13. 2p . it suffices to consider the matrix = 
I n + \{I n - B+B)(I n - 2A+A) - X(I n - A+A) = (1 - X)I n + AT. By (J2£J, we have 



B 



cos 2 6 + (1 - A) sin 2 e 


A cos 6 sin 6 








\ 


—A cos sin 6 


cos 2 6 + (1 - A) sin 2 6 
















(1 — A)/( r _ m ) 
















I(r+m- 


-n) ) 



B 1 



where B = (B ,B U B 2 ). 

Notice that T\ is a normal matrix. By the discussion in Section 2, if y k in the iteration of 
(13. 2 p converges to an interior fixed point, the asymptotic convergence rate will be governed 
by the matrix 





f cos 2 + (1 - A) sin 2 9 


A cos 6 sin 6 





M A = | 


—A cos 9 sin 6 


cos 2 9 + (1 - A) sin' 9 







V o 





(1 — A)J(. r _ m ) 



Note that ||M A || = \/A(2 - A) cos 2 Q x + (1 - A) 2 > cosfli for any A e (0,2). Therefore, the 
asymptotic convergence rate of (I3.2p is always slower than (II. 3p if A ^ 1. 
Example 4 The matrix A is a 5 x 40 random matrix and x* has three nonzero components. 
See Figure [37T1 for the comparison of (II. 3p and (13. 2 p with 7 = 1. 




Iteration number k 

Figure 3.1: Example 4: The Generalized Douglas Rachford (13. 2p with different A and fixed 
7 = 1. The asymptotic convergence rate of (11.31) (A = 1) is the fastest. 

4 Douglas Rachford for the £ 2 regularized Basis Pur- 
suit 

Consider the regularized problem 

min{||x||i H ||x|| 2 : Ax = b}. (4-1) 

x 2a 
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It is proven in [TJ] that there exists a ctoo such that the solution of (14. ip with a > is the 
solution of (11. ip . See [UJ for more discussion of a^. For the rest of this section, we assume 
a is taken large enough so that a > a^. 

It is equivalent to solve the problem min II x II i + is x A X =b\ + tHM! 2 - To use the Douglas 

X 

Rachford splitting, in particular, we can choose F(x) = \\x\\i + ^H^H 2 an d G(x) = L{ x -.Ax=b}- 
The resolvent of F is J 7 f(^c) = argmin \\z\\i + ^\\z\\ 2 + 1| 



z — x\ 



argmin + h\\z- 

o q+7 ll ll 1 2 11 



Q+7 



X 



| 2 . Therefore, J 1 f{ 



x 



PL 

ct+7 



S 1 (x). The Douglas Rachford splitting ( 11. 2p for ( II. ip reads 
U k+1 — 7^S 7 (2x k - y k ) + y k - x k 



_ y 



k+1 + A + {b- Ay k+l ) 



(4.2) 



Since ||x||i + 2^||x|| 2 is a strongly convex function, (14. ip always has a unique minimizer 
x* as long as {x : Ax = b} is nonempty. The first order optimality condition G dF(x*) + 
dG(x*) implies the dual certificate set (<9||x*||i + ^x*) fl 1Z(A T ) is nonempty. Let 
-^-<? 7 o(2P-I)+I-P. 

Lemma 4.1. The set of the fixed points ofT® can be described as 



if :// = > : --;/.;/€ (d\\x*\\ 1 + -x* ) n'R(A 



The proof is similar to the one of Lemma 12.21 We also have 

Lemma 4.2. For any y satisfying ^-R(y) £ Q and any fixed point y* , T®(y) — T"(y*) 
[c(/ n - B+B)(I n - A + A) + cB + BA + A + (1 - c)A + A] (y - y*) where c = 

Proof. First, we have 



™ = [c5 7 o(2P-I)+I-P](y) = c5 7 (R(y)) + j/-P(y) 



- 7 e i sgn( 

iGiV(x*) 



B+BR(y) 



+ y-P(y) 



Similarly we also have 



R(y*)-7 ^sgn^-S+BR^*) 



y* - P(y*) 



20 



Let v = y — y*, then 

T«{y) - T«{y*) 



c[R(y)-B+BR{y)]+y-P(y) 

-c [R(y*) - B+BR(y*)] - (y* - P(y*)) 

c[I n - 2A+A - B + B + 2B + BA + A]v + A + Av 

[c{I n - B + B){I n - A + A) + cB+BA+A + (1 - c)A + A]v. 



□ 



Consider the matrix T(c) = c(J n - B + B){I n - A+A) + cB+BA+A + (1 - c)A+A with 



a+7 



. Then T(c) = cT + (1 - c)A+A where T = (/„ - B+B)(I n - A+A) + B+B A+A. 



By (EZD and (EH]), we have 



T(c) = (B ,B 1 ,B 2 



V 



(1 - c) sin 2 6 + ccos 2 9 


(2c- 1) cos 9 sin 9 





' 


— cos 9 sin 9 


cos 2 9 














0(r— m) 














I(r+m—n) 



\ 



T 



(4.3) 



Following the proof in [15J, it is straightforward to show there exists a dual certificate 
r] E {d\\x% + ^x*)nn(A T ) such that F M(B) (r]) = F X{B) (x*) and ||P w(B T)(»7)lloo < 1- So 



there is at least one interior fixed point. Following Lemma [2. 2\ there is only one fixed point 
y* if and only if 1Z(A T ) n TZ(B T ) = {0}. 

For simplicity, we only discuss the interior fixed point case. The boundary fixed point 
case is similar to the previous discussion. 

Assume y k converges to an interior fixed point y*. Let e be the largest number such 
that B e (R(y*)) C S. Let K be the smallest integer such that y K G B e (y*) (thus R{y K ) G 
B £ (R(y*))) . By nonexpansiveness of and R, we get R(y fc ) G S £ (R(y*)) for any k > K. So 
we have 



|y*-y*|| = ||T(c)(y*-y*)|| = ■ • • = \\T(c) k ~ K (y K -y*)\\ < ||T(c)^|| \\(y K -y*)l VA; > K 
Notice that T(c) is a nonnormal matrix, so ||T(c) 9 || is much less than ||T(c)|| 9 for large 
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q. Thus the asymptotic convergence rate is governed by lim <F/||T(c) 9 ||, which is equal to 

q— >oo 

the norm of the eigenvalues of T(c) with the largest magnitude. 
It suffices to study the matrix M(c) = 



(f - c) sin 2 6 + ccos 2 6 


(2c- 1) cos 6 sin6 


— cos sin 


cos 2 6 



because P-R.(A T )nn(B T )(y K ~ V*) — (otherwise y k cannot converge to y*). 

n—r 

Notice that det(M(c)-AI) = U [A 2 - (ccos(20;) + 1)A + ccos 2 0*]. Let A(0, c) denote the 

8=1 

solution with the largest magnitude for the quadratic equation A 2 — (c cos(20) + 1) A + c cos 2 0, 
with discriminant A = cos 2 (20) c 2 — 2c + 1. 

The two solutions of A = are [l±sin(20)]/ cos 2 (20). Notice that [l+sin(20)]/ cos 2 (20) > 
1 for G [0, 7r/2] and c G (0, 1), we have 

/<5cos0 if c > 1_sin ( 2e ) = i 

|A(0, C )| = <;:/ ' . \ . (corner (44) 



I (c cos(20) + 1 + v /cos 2 (20)c 2 -2c+ l) if c < 



(cos 8+s'm8) 2 

It is straightforward to check that |A(0, c)| is monotonically decreasing with respect to 9. 
Therefore, the asymptotic convergence rate is equal to |A(0i,c)|. 

Let c * = (cosex+sineo^ which is ec l ual to argmin c |A(0i,c)|. Let c tt = 1+2 ^ osgi which 
is the solution to |A(0i,c)| = cos0i. See Figure H~T1 Then for any c G (c", 1), we have 
|A(0i,c)| < cos 0i. Namely, the asymptotic convergence rate of (14. 2p is faster than (jl.3p if 
G (c",l). The best asymptotic convergence rate that (14. 2 p can achieve is |A(0i,c*)| = 
Vc T cos0 1 = , co l^ . = j-^r- when -S- = c*. 

v - 1 - cosyi+sin&i l+tan&i 0+7 

Example 5 The matrix A is a 40 x 1000 random matrix and x* has two nonzero components. 
This is a typical compressive sensing problem. We compare the algorithm (14. 2 p to the dual 
split Bregman method in [13] since both use £ 2 regularization and the pseudo-inverse of A. 
See Figure fl~2l for the error curve of x k . The best choice of the parameter according to Figure 
14.11 should be a/ (a + 7) = c* which is c* = 0.756 for this example. Here c* indeed gives the 
best asymptotic rate 1+t ^ ngi but c* is not necessarily the most efficient choice, as we can see 
in the Figure. On the other hand, with the same parameters, the asymptotic convergence 
rate of the dual split Bregman method seems to be between the rate |A(0i,c)| of (14. 2 p and 
the rate cos 0i of (II. 3p . 
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Figure 4.1: An illustration of ( 14.4D for a fixed 9. 




800 



Iteration number k Iteration number k 



Figure 4.2: Example 5: a = 30 is fixed. DR stands for (11 .3p and Generalized DR stands for 
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5 Conclusion 



In this paper, we analyzed the asymptotic convergence rate for Douglas Rachford splitting 
algorithm on the primal formulation of the basis pursuit, providing an explanation of fast 
convergence of such algorithms. The same analysis also applies to other variants of Douglas 
Rachford splitting. In particular, we get an explicit formula of the asymptotic convergence 
rate which might serve as a guideline for choosing parameters in £ 2 -regularized Douglas 
Rachford. 
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